12 research outputs found

    Multi-Source Multi-View Clustering via Discrepancy Penalty

    With the advance of technology, entities can be observed in multiple views. Multiple views containing different types of features can be used for clustering. Although multi-view clustering has been successfully applied in many applications, previous methods usually assume a complete instance mapping between different views. In many real-world applications, information can be gathered from multiple sources, and each source can contain multiple views, which are more cohesive for learning. The views under the same source are usually fully mapped, but they can be very heterogeneous. Moreover, the mappings between different sources are usually incomplete and only partially observed, which makes it more difficult to integrate all the views across sources. In this paper, we propose MMC (Multi-source Multi-view Clustering), a framework based on collective spectral clustering with a discrepancy penalty across sources, to tackle these challenges. MMC has several advantages over existing methods. First, it can deal with incomplete mappings between sources. Second, it considers the disagreements between sources while treating the views in the same source as a cohesive set. Third, it tries to infer the instance similarities across sources to enhance clustering performance. Extensive experiments conducted on real-world data demonstrate the effectiveness of the proposed approach.

    Online Unsupervised Multi-view Feature Selection

    In the era of big data, it is becoming common to have data with multiple modalities or coming from multiple sources, known as "multi-view data". Because multi-view data are usually unlabeled and come from high-dimensional spaces (such as language vocabularies), unsupervised multi-view feature selection is crucial to many applications. However, it is nontrivial due to the following challenges. First, there may be so many instances, or the feature dimensionality may be so large, that the data do not fit in memory. How can useful features be selected with limited memory space? Second, how can features be selected from streaming data while handling concept drift? Third, how can the consistent and complementary information from different views be leveraged to improve feature selection when the data are too big or arrive as streams? To the best of our knowledge, none of the previous works can solve all of these challenges simultaneously. In this paper, we propose an Online unsupervised Multi-View Feature Selection method, OMVFS, which deals with large-scale/streaming multi-view data in an online fashion. OMVFS embeds unsupervised feature selection into a clustering algorithm via NMF with sparse learning. It further incorporates graph regularization to preserve the local structure information and help select discriminative features. Instead of storing all the historical data, OMVFS processes the multi-view data chunk by chunk and aggregates all the necessary information into several small matrices. By using a buffering technique, OMVFS can reduce the computational and storage cost while taking advantage of the structure information. Furthermore, OMVFS can capture concept drift in the data streams. Extensive experiments on four real-world datasets show the effectiveness and efficiency of the proposed OMVFS method. More importantly, OMVFS is about 100 times faster than the off-line methods.
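The chunk-by-chunk aggregation mentioned in this abstract can be illustrated with a generic sketch. It is not the OMVFS update rule, which the abstract does not spell out; it only shows the general idea of folding each incoming chunk into a small fixed-size matrix (here a d x d Gram matrix) so that earlier chunks can be discarded. The function names `update_gram` and `streaming_gram` are hypothetical.

```python
def update_gram(gram, chunk):
    """Add the contribution of one chunk (a list of d-dimensional rows) to gram in place."""
    d = len(gram)
    for row in chunk:
        for i in range(d):
            for j in range(d):
                gram[i][j] += row[i] * row[j]
    return gram


def streaming_gram(chunks, d):
    """Accumulate X^T X over an iterable of chunks without keeping past chunks."""
    gram = [[0.0] * d for _ in range(d)]
    for chunk in chunks:
        update_gram(gram, chunk)  # the chunk can be dropped after this call
    return gram


# Two chunks of a 3 x 2 data matrix, processed one at a time.
example_chunks = [[[1, 2], [3, 4]], [[5, 6]]]
example_gram = streaming_gram(example_chunks, d=2)
```

Memory use depends only on the feature dimension d, not on the number of instances seen, which is the property the abstract attributes to its "several small matrices".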

    Aggregator: a machine learning approach to identifying MEDLINE articles that derive from the same underlying clinical trial

    Objective: It is important to identify separate publications that report outcomes from the same underlying clinical trial, in order to avoid over-counting these as independent pieces of evidence. Methods: We created positive and negative training sets (comprised of pairs of articles reporting on the same condition and intervention) that were, or were not, linked to the same clinicaltrials.gov trial registry number. Features were extracted from MEDLINE and PubMed metadata; pairwise similarity scores were modeled using logistic regression. Results: Article pairs from the same trial were identified with high accuracy (F1 score = 0.843). We also created a clustering tool, Aggregator, that takes as input a PubMed user query for RCTs on a given topic, and returns article clusters predicted to arise from the same clinical trial. Discussion: Although painstaking examination of full text may be needed to be conclusive, metadata are surprisingly accurate in predicting when two articles derive from the same underlying clinical trial.
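The pipeline this abstract describes (pairwise similarity features, a logistic model, then clusters of linked articles) can be sketched as follows. Everything specific here is an assumption for illustration: the two Jaccard features, the weights and bias, the 0.5 threshold, the record fields, and the use of connected components for clustering are not taken from the Aggregator tool itself.

```python
import math
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two collections, 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_probability(features, weights, bias):
    """Logistic model: probability that two articles report the same trial."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def cluster_articles(articles, weights, bias, threshold=0.5):
    """Link pairs scoring at or above the threshold; return connected components."""
    parent = list(range(len(articles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(articles)), 2):
        a, b = articles[i], articles[j]
        features = [
            jaccard(a["authors"], b["authors"]),
            jaccard(a["title"].lower().split(), b["title"].lower().split()),
        ]
        if pair_probability(features, weights, bias) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, article in enumerate(articles):
        clusters.setdefault(find(i), []).append(article["id"])
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical records and hand-picked (not learned) model parameters.
example_articles = [
    {"id": "A1", "authors": ["Smith J", "Lee K"],
     "title": "Drug X for asthma: a randomized trial"},
    {"id": "A2", "authors": ["Smith J", "Lee K", "Wu P"],
     "title": "Drug X for asthma: two-year follow-up"},
    {"id": "A3", "authors": ["Garcia M"],
     "title": "Exercise therapy in back pain"},
]
example_clusters = cluster_articles(example_articles, weights=[4.0, 4.0], bias=-3.0)
```

In a real setting the weights would be fitted on labeled pairs, such as those linked or not linked to the same registry number, rather than chosen by hand.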

    Nuggets: findings shared in multiple clinical case reports

    No full text

    Improving Soil Enzyme Activities and Related Quality Properties of Reclaimed Soil by Applying Weathered Coal in Opencast-Mining Areas of the Chinese Loess Plateau

    No full text
    Reclaimed soil in opencast-mining areas of the Loess Plateau of China suffers from many problems, such as poor soil structure and severe nutrient deficiency. To find a better way to improve soil quality, this study applied weathered coal to repair the soil medium and investigated the physicochemical properties of the reclaimed soil and the changes in enzyme activities after planting Robinia pseudoacacia. The results showed that applying weathered coal significantly improved the quality of soil aggregates, increased the content of water-stable aggregates, and significantly improved the organic matter, humus, and cation exchange capacity of the topsoil, but it did not have a significant effect on soil pH. Planting R. pseudoacacia significantly enhanced the activities of soil catalase, urease, and invertase, but the application of weathered coal inhibited the activity of catalase. Although applying an appropriate amount of weathered coal significantly increased urease activity, the activities of catalase, urease, and invertase were closely linked to soil profile level and time. This study suggests that applying weathered coal could improve the physicochemical properties and soil enzyme activities of the reclaimed soil in opencast-mining areas of the Loess Plateau of China, and that the optimum application rate of weathered coal for reclaimed soil remediation is about 27,000 kg hm⁻².